    Treatment Outcome Prediction in Locally Advanced Cervical Cancer: A Machine Learning Approach using Feature Selection on Multi-Source Data

    Cancer is a significant global health issue, and cervical cancer, one of the most common cancers among women, has far-reaching impacts worldwide. Researchers are studying the disease and its risk factors from many perspectives and with novel technologies. Machine learning is increasingly applied in cancer research because it can analyze complex data relationships, recognize patterns, adapt to new information, and integrate with other technologies. Predictive machine learning models that anticipate treatment outcomes before therapy begins might help healthcare providers make more informed decisions, allocate resources effectively, and provide personalized care. Despite significant efforts in the scientific community, developing accurate machine learning models for cervical cancer treatment outcome prediction still faces several open challenges. A major one is the limited availability and quality of data: both vary across datasets, which can significantly affect the performance and applicability of the models. It is also crucial to identify the most informative and relevant features from diverse data sources, including clinical, imaging, and molecular data, to ensure accurate outcome prediction. Moreover, cancer datasets often suffer from class imbalance, which must be addressed to prevent biased predictions and improve overall model performance. This study aims to improve the prediction of treatment outcomes in patients with locally advanced cervical cancer by using a multi-source dataset and developing several machine learning models. The dataset includes data sources such as medical images, gene scores, and clinical data.
A preprocessing pipeline is developed to prepare the data for training machine learning models. The Repeated Elastic Net Technique (RENT) is employed as a feature selection method to reduce dataset dimensionality, shorten model training time, and identify the most influential features for classifying patients' treatment results. Furthermore, the Synthetic Minority Oversampling Technique (SMOTE) is used to address class imbalance in the dataset, and its impact on model performance is assessed. The findings indicate that the available data show promise for predicting patients' treatment outcomes early, suggesting that the developed models could serve as valuable auxiliary tools for medical professionals. Although model performance remained relatively unchanged after applying RENT, the models' average training time was reduced more than 8-fold in the worst case. Moreover, under stricter feature selection criteria, clinical features played a more prominent role in predicting treatment results than features from other data sources. Finally, balancing the dataset with SMOTE improved the average performance of specific models by up to 44 times.
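
The abstract does not describe how SMOTE was implemented in the study, so the following is only a minimal plain-Python sketch of the general SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are hypothetical and are not taken from the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative sketch only; not the implementation used in the study.
    `minority` is a list of equal-length numeric feature vectors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(toy_minority, n_new=6, k=2)
print(len(new_samples), "synthetic samples generated")
```

In a typical workflow, oversampling of this kind is applied only to the training portion of the data, so that synthetic samples do not leak into the data used for evaluation.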